
Automated data processing and ingestion

Often, customers have a lot of data to ingest into the HxGN Smart Sites system. Some of this data requires processing.

To support such activities, the HxGN Smart Sites installation includes Apache NiFi. NiFi allows the creation of automated data flows that, for example, take data from a source location, process it, and then make the result available in HxGN Smart Sites.

The details of a data flow depend on the customer's environment, as well as the type and quantity of the data. For this reason, we do not provide pre-built data flows with HxGN Smart Sites.

However, we have created some example data flows that are available on request. The examples we currently have available are:

  • Processing flow: automated conversion of BINZ datasets, as produced by Smart Interop Publisher (not part of HxGN Smart Sites), to an OGC 3D Tiles dataset and a corresponding metadata database. The converters used in this data flow are provided with HxGN Smart Sites in the form of command line applications (hss-bim-converter and hss-binz-metadata-converter).
  • Publishing flow: automated publishing of an OGC 3D Tiles dataset and its corresponding metadata database in HxGN Smart Sites. This can also be accomplished by manual configuration in Fusion Studio and the Admin application.
note

If you are interested in setting up these or other data flows, please contact the HxGN Smart Sites support team. We can provide related assistance or training.

Processing flow

Data flow

The input of the processing flow consists of:

  1. 3D models in the .binz format
  2. their metadata in the .mdb2 format

The input data should be placed in a directory to which we will refer as the INBOX. The location of the INBOX depends on the configuration. It should follow the structure below:

INBOX
├─ Cluster_A
│ ├─ Block_100.binz
│ ├─ Block_100.mdb2
│ ├─ Block_206.binz
│ ├─ Block_206.mdb2
│ ├─ ...
│ └─ directory.prj
├─ Cluster_B
│ └─ ...
└─ ...
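To illustrate the expected layout, the pairing of model and metadata files per group can be sketched as follows. This is a hypothetical helper for illustration only; the actual grouping is performed by the NiFi flow, and the function name and return shape are assumptions:

```python
from pathlib import Path

def collect_groups(inbox: Path) -> dict:
    """Map each first-level INBOX directory (a group) to the basenames
    that have both a .binz model and a matching .mdb2 metadata file.

    Illustration only: the real flow is implemented in NiFi.
    """
    groups = {}
    for cluster in sorted(p for p in inbox.iterdir() if p.is_dir()):
        models = {p.stem for p in cluster.glob("*.binz")}
        metadata = {p.stem for p in cluster.glob("*.mdb2")}
        # Other files, such as directory.prj, are ignored here.
        groups[cluster.name] = sorted(models & metadata)
    return groups
```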

Each first-level directory corresponds to a group of data.

Within a group, the .binz files are combined into a single set of OGC 3D Tiles and the .mdb2 files are combined into one gpkg file.

note

During the conversion of a group, the flow will create a temporary directory. After the conversion, the flow removes this directory.

For each group, the flow creates a .zip archive in the OUTBOX directory. The archive contains the converted 3D model and converted metadata.
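The archiving step can be sketched roughly as below. This is an assumption-laden illustration, not the flow's actual implementation; the archive naming and the internal layout of a converted group are hypothetical:

```python
import zipfile
from pathlib import Path

def archive_group(converted_dir: Path, outbox: Path) -> Path:
    """Pack a converted group (3D Tiles plus .gpkg) into one OUTBOX
    .zip archive, preserving relative paths.

    Sketch only: the real flow performs this step inside NiFi.
    """
    outbox.mkdir(parents=True, exist_ok=True)
    archive = outbox / f"{converted_dir.name}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(converted_dir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(converted_dir))
    return archive
```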

Monitoring

The flow reacts to changes in the INBOX, both when file timestamps change and when new data is added.
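Conceptually, this kind of change detection amounts to comparing snapshots of file modification times. The sketch below illustrates the idea; NiFi handles this internally, so the helper names and polling approach are assumptions:

```python
from pathlib import Path

def snapshot(inbox: Path) -> dict:
    """Map each file under INBOX to its modification time."""
    return {str(p): p.stat().st_mtime
            for p in inbox.rglob("*") if p.is_file()}

def detect_changes(before: dict, after: dict) -> list:
    """Return paths that are new or whose timestamp changed
    between two snapshots."""
    return sorted(p for p, mtime in after.items()
                  if before.get(p) != mtime)
```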

Publishing flow

Data flow

The input of the publishing flow is a .zip archive that follows the file structure of the archives generated by the processing flow. Archives should be placed in a directory to which we refer as the INBOX. The location of the INBOX depends on the configuration.

The flow will:

  1. unpack the archives residing in the INBOX.
  2. move the OGC 3D Tiles to the file-server by means of the sftp-server.
  3. move the .gpkg to the data root of fusion-server by means of the sftp-server.
  4. create or update a WFS service in fusion-server from the uploaded .gpkg.
  5. create the corresponding Datasource and Publication in admin-server. The flow uses the name of the archive to assign a title to the Datasource and Publication.
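The first and last of the steps above can be sketched together: unpacking an archive and deriving the title from its name. This is an illustrative sketch only; the SFTP transfers and the fusion-server/admin-server calls are performed by the NiFi flow and are not reproduced here:

```python
import zipfile
from pathlib import Path

def unpack_archive(archive: Path, workdir: Path) -> tuple:
    """Unpack one INBOX archive into a working directory.

    The archive name (without extension) becomes the title assigned
    to the Datasource and Publication in step 5. Sketch only.
    """
    title = archive.stem
    target = workdir / title
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return title, target
```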
note

When this flow creates data, products and services in fusion-server, it assigns the keyword "NIFI" to allow you to easily find them.

The flow prefers to reuse existing data, products, and services over creating new ones. Reuse is done based on the data path, product name, or service type and name. For an item to be reused, it must carry the keyword assigned by the flow. This prevents overwriting of items you have configured already, but allows you to modify the configuration of items created by the flow.
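The reuse decision can be illustrated as below. The item shape (dictionaries with "path" and "keywords" fields) is an assumption made for the sketch; the actual check happens inside the NiFi flow against fusion-server:

```python
def find_reusable(items, data_path, keyword="NIFI"):
    """Return an existing item matching the data path, but only when
    it carries the flow's keyword; manually configured items are
    skipped so they are never overwritten. Sketch with an assumed
    item shape, not the flow's real logic.
    """
    for item in items:
        if item.get("path") == data_path and keyword in item.get("keywords", ()):
            return item
    return None
```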

note

The flow prefers to reuse an existing Datasource and Publication over creating new ones in admin-server. Reuse is done when the title, protocol and URL match for the Datasource, or when the title and protocol match for the Publication. As such, you can modify the configuration of either and the data flow will leave your modification in place.

When this flow does create a new Publication, it assigns an initial set of roles. This set is configurable in the flow.

Monitoring

The flow reacts to changes in the INBOX, both when file timestamps change and when new data is added.

Combining the processing and publishing flow

Although the two flows are designed to be independent, they can easily be combined by configuring the OUTBOX of the processing flow as the INBOX of the publishing flow.

With such a configuration:

  • the data changes in the INBOX of the processing flow,
  • the processing flow starts converting the data,
  • the processing flow writes the archive to its OUTBOX,
  • the publishing flow detects changes in its INBOX,
  • the publishing flow updates the data in file-server and fusion-server.
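Chaining the flows requires no extra logic; the two directory settings simply point at the same location. The paths below are hypothetical placeholders, since the actual locations depend on your NiFi configuration:

```python
# Hypothetical directory settings for the combined setup.
processing = {"inbox": "/data/processing/inbox", "outbox": "/data/exchange"}
publishing = {"inbox": processing["outbox"]}  # OUTBOX doubles as INBOX
```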